Fix: bug(gateguard): race condition in session state read-modify-write path#1442
Fix: bug(gateguard): race condition in session state read-modify-write path#1442shahyashish wants to merge 1 commit intoaffaan-m:mainfrom
Conversation
…ate read-modify-write path
|
No actionable comments were generated in the recent review. 🎉 ℹ️ Recent review info⚙️ Run configurationConfiguration used: defaults Review profile: CHILL Plan: Pro Run ID: 📒 Files selected for processing (1)
📝 WalkthroughWalkthroughUpdated the Changes
Possibly related issues
Estimated code review effort🎯 3 (Moderate) | ⏱️ ~20 minutes Poem
🚥 Pre-merge checks | ✅ 2 | ❌ 1❌ Failed checks (1 warning)
✅ Passed checks (2 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings. ✨ Finishing Touches🧪 Generate unit tests (beta)
Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. Comment |
Greptile SummaryThis PR replaces the simple overwrite in
Confidence Score: 4/5Improvement is real but the race condition the PR claims to fix is only narrowed, not eliminated; address before merging if correctness of concurrent writes is a hard requirement. One P1 finding: the TOCTOU window between the merge-read and the rename persists, meaning the 'ensures additive' guarantee in the PR description does not hold under tight concurrency. The fix is a significant improvement in practice (window shrinks from call-stack duration to disk I/O time), but the stated correctness claim is overstated. scripts/hooks/gateguard-fact-force.js — specifically the saveState merge window (lines 86–108) and the pruneStaleFiles filter (line 133) Important Files Changed
Sequence DiagramsequenceDiagram
participant PA as Process A (saveState)
participant FS as File System
participant PB as Process B (saveState)
Note over PA,PB: Happy path — fix works (serial reads)
PA->>FS: readFileSync(STATE_FILE) → {checked:[]}
PA->>FS: renameSync(tmp_A) → {checked:["file_A"]}
PB->>FS: readFileSync(STATE_FILE) → {checked:["file_A"]}
PB->>FS: renameSync(tmp_B) → {checked:["file_A","file_B"]}
Note over PA,PB: Remaining race — both read before either writes
PA->>FS: readFileSync(STATE_FILE) → {checked:[]}
PB->>FS: readFileSync(STATE_FILE) → {checked:[]}
PA->>FS: renameSync(tmp_A) → {checked:["file_A"]}
PB->>FS: renameSync(tmp_B) → {checked:["file_B"]}
Note over FS: file_A silently lost
|
| try { | ||
| if (fs.existsSync(STATE_FILE)) { | ||
| const disk = JSON.parse(fs.readFileSync(STATE_FILE, 'utf8')); | ||
| if (Array.isArray(disk.checked)) { | ||
| // Union of checked items, preserving order (disk items first, then memory items) | ||
| mergedChecked = Array.from(new Set([...disk.checked, ...mergedChecked])); | ||
| } | ||
| if (typeof disk.last_active === 'number') { | ||
| mergedLastActive = Math.max(mergedLastActive, disk.last_active); | ||
| } | ||
| } | ||
| } catch (_) { /* ignore disk read errors, proceed with memory state */ } | ||
|
|
||
| // 3. Finalize state: update heartbeat and prune to bounded size | ||
| const finalState = { | ||
| checked: pruneCheckedEntries(mergedChecked), | ||
| last_active: Math.max(mergedLastActive, Date.now()) | ||
| }; | ||
|
|
||
| // 4. Atomic write: temp file + rename prevents partial reads | ||
| const tmpFile = STATE_FILE + '.tmp.' + process.pid + '.' + Math.random().toString(36).slice(2, 6); | ||
| fs.writeFileSync(tmpFile, JSON.stringify(finalState, null, 2), 'utf8'); | ||
| fs.renameSync(tmpFile, STATE_FILE); |
There was a problem hiding this comment.
TOCTOU window still present between merge-read and rename
The merge read (readFileSync on line 88) and the atomic rename (line 108) are separate, non-atomic operations. If two processes both complete their readFileSync before either calls renameSync, the second rename will silently overwrite the first's new entries. The PR description says this "ensures that checks... are additive rather than destructive," but that guarantee only holds when one process completes its rename before the other's inner read — a timing assumption that cannot be relied upon.
Concrete failure path:
- Process A reads disk →
{checked: []} - Process B reads disk →
{checked: []}(before A writes) - Process A renames →
["file_A"]persisted - Process B renames →
["file_B"]persisted,"file_A"silently lost
The window is now the disk I/O span of lines 87–97 rather than the entire loadState→saveState call chain, which is a meaningful improvement, but the window is not zero. Truly eliminating it would require an OS-level exclusive lock (e.g. lockfile / proper-lockfile) around the read-modify-write, or a retry loop that re-reads after the rename and merges again if the file changed.
| } | ||
|
|
||
| module.exports = { run }; | ||
| module.exports = { run }; No newline at end of file |
There was a problem hiding this comment.
There was a problem hiding this comment.
1 issue found across 1 file
Prompt for AI agents (unresolved issues)
Check if these issues are valid — if so, understand the root cause of each and fix them. If appropriate, use sub-agents to investigate and fix each issue separately.
<file name="scripts/hooks/gateguard-fact-force.js">
<violation number="1" location="scripts/hooks/gateguard-fact-force.js:88">
P1: Concurrent saveState calls can still lose updates because merge-style read-modify-write is not synchronized across processes.</violation>
</file>
Reply with feedback, questions, or to request a fix. Tag @cubic-dev-ai to re-run a review.
| // 2. Merge with current disk state to prevent overwriting concurrent updates (Option 2: Merge-style saves) | ||
| try { | ||
| if (fs.existsSync(STATE_FILE)) { | ||
| const disk = JSON.parse(fs.readFileSync(STATE_FILE, 'utf8')); |
There was a problem hiding this comment.
P1: Concurrent saveState calls can still lose updates because merge-style read-modify-write is not synchronized across processes.
Prompt for AI agents
Check if this issue is valid — if so, understand the root cause and fix it. At scripts/hooks/gateguard-fact-force.js, line 88:
<comment>Concurrent saveState calls can still lose updates because merge-style read-modify-write is not synchronized across processes.</comment>
<file context>
@@ -76,14 +76,37 @@ function pruneCheckedEntries(checked) {
+ // 2. Merge with current disk state to prevent overwriting concurrent updates (Option 2: Merge-style saves)
+ try {
+ if (fs.existsSync(STATE_FILE)) {
+ const disk = JSON.parse(fs.readFileSync(STATE_FILE, 'utf8'));
+ if (Array.isArray(disk.checked)) {
+ // Union of checked items, preserving order (disk items first, then memory items)
</file context>
|
@shahyashish — thanks for picking up #1441 so quickly. Heads up per #1453: the merge-style rewrite in your new `saveState` dropped the Windows `EEXIST` / `EPERM` unlink-before-rename branch that currently lives in `main` (landed as part of #1439's GateGuard consolidation). `fs.renameSync` throws `EEXIST` on Windows when the target already exists, so without the unlink fallback the second write onwards silently no-ops inside the catch-all and GateGuard stops seeing checked files. The branch to preserve: ```js FWIW #1450 hit a parallel shape on the Python side — three Windows-specific pitfalls in a row (launcher-as-cmd-builtin → cmd.exe re-parsing → argv quoting) before landing on `explorer.exe`. Cross-platform edge cases tend to come in layers; might be worth a full Windows smoke pass on the merged state before ship. |
The fix addresses a race condition in the session state read-modify-write path by implementing a 'merge-style save' strategy. Previously,
saveStatewould simply overwrite the session file with the current in-memory snapshot, which could lead to data loss if two concurrent hook processes were running for the same session. Now,saveStateperforms a raw read of the current on-disk state immediately before writing, merges it with the in-memory state (creating a union of 'checked' keys and taking the maximum 'last_active' timestamp), and then performs an atomic write via a temp file. This ensures that checks performed by parallel processes are additive rather than destructive. A random suffix was also added to the temp file path to further reduce collision risks.Test: 1. Set
CLAUDE_SESSION_ID=test_race. 2. Run two instances of the hook script nearly simultaneously, each attempting tomarkCheckeda different file (e.g.,file_Aandfile_B). 3. Verify the resulting session JSON file in the state directory (.gateguard/state-test_race.json) contains bothfile_Aandfile_Bin itscheckedarray, regardless of which process finished first.Summary by CodeRabbit
Summary by cubic
Fixes a race condition in GateGuard session saves by merging on-disk and in-memory state before an atomic write, preventing lost
checkedentries during parallel runs. Fixes #1441.saveStatenow does a merge-style save: unionscheckedand takes the maxlast_activewith the on-disk state.Written for commit 4cf12e8. Summary will update on new commits.